Project-Team: WILLOW

Inria | Raweb 2018 | Presentation of the Project-Team WILLOW | WILLOW Web Site




Section: New Software and Platforms

Mixture-of-Embedding-Experts

Keyword: Computer vision

Functional Description: Joint understanding of video and language is an active research area with many applications. Prior work in this domain typically relies on learning text-video embeddings. One difficulty with this approach, however, is the lack of large-scale annotated video-caption datasets for training. To address this issue, we aim to learn text-video embeddings from heterogeneous data sources. To this end, we propose a Mixture-of-Embedding-Experts (MEE) model that can handle missing input modalities during training. As a result, our framework can learn improved text-video embeddings simultaneously from image and video datasets. We also show that MEE generalizes to other input modalities such as face descriptors.
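
To make the idea of handling missing input modalities concrete, here is a minimal sketch in Python with NumPy. It is not the released MEE code. It assumes that each modality (for example appearance, motion, or face) has its own "expert" producing a text embedding and a video embedding, that the final text-video similarity is a weighted sum of per-expert cosine similarities, and that the weights come from a softmax taken only over the modalities actually present. The function name `mee_similarity`, the `gate_logits` argument, and the modality names are hypothetical and chosen for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale a vector to (approximately) unit Euclidean norm."""
    x = np.asarray(x, dtype=float)
    return x / (np.linalg.norm(x) + eps)


def mee_similarity(text_embs, video_embs, gate_logits):
    """Illustrative mixture-of-experts similarity with missing modalities.

    text_embs:   dict modality -> text embedding for that expert
    video_embs:  dict modality -> video embedding, or None if the modality
                 is missing for this sample (e.g. no motion for a still image)
    gate_logits: dict modality -> unnormalized weight (assumed to be
                 predicted from the text in a real model)

    Returns (similarity, weights), where weights only covers the
    modalities that were available.
    """
    available = [m for m, v in video_embs.items() if v is not None]
    if not available:
        raise ValueError("at least one modality must be available")

    # Softmax restricted to available experts: missing ones get no weight,
    # and the remaining weights are renormalized to sum to 1.
    logits = np.array([gate_logits[m] for m in available], dtype=float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Per-expert cosine similarity between text and video embeddings.
    sims = np.array([
        float(l2_normalize(text_embs[m]) @ l2_normalize(video_embs[m]))
        for m in available
    ])
    return float(weights @ sims), dict(zip(available, weights))
```

Under these assumptions, a still image from an image-caption dataset can be treated as a video whose motion input is `None`: the motion expert is skipped and the remaining experts share the full weight, which is one way a single model could be trained on image and video datasets together.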

Participants: Ivan Laptev and Josef Sivic
Contact: Antoine Miech
Publication: Learning a Text-Video Embedding from Incomplete and Heterogeneous Data
URL: https://www.di.ens.fr/willow/research/mee/
